High-throughput chromosome conformation capture (Hi-C) has become the gold standard for studying 3D genome organization. While Hi-C generates rich 2D contact maps, a major goal of many genomic studies is to reconstruct the underlying 3D chromatin structures from this data.
HiSpaR is an R package designed to infer these 3D structures using hierarchical Bayesian modeling and Markov Chain Monte Carlo (MCMC) sampling. Unlike traditional constraint-based methods, HiSpaR utilizes a hierarchical framework to efficiently handle large-scale datasets while providing robust uncertainty quantification for the inferred coordinates.
HiSpaR integrates seamlessly with the HiCExperiment ecosystem. Users can easily import Hi-C data from standard formats (such as .mcool and .hic) into HiCExperiment objects and pass them directly to HiSpaR, streamlining the workflow from raw data to 3D visualization.
# Install HiSpaR from Bioconductor
if (!requireNamespace("BiocManager", quietly = TRUE))
install.packages("BiocManager")
BiocManager::install("HiSpaR")
# HiSpaR depends on several packages for Hi-C data handling
# These are automatically installed with HiSpaR:
# - HiCExperiment: For representing Hi-C data
# - Matrix: For sparse matrix handling
# Additionally, install example data package and HiContacts for data manipulation
BiocManager::install("HiContactsData")
BiocManager::install("HiContacts")
# Optional: Install visualization and analysis packages
install.packages("plotly") # For interactive 3D visualization
HiSpaR requires the following Bioconductor and CRAN packages:
library(HiSpaR)
#> Consider using the `HiContacts` package to perform advanced genomic operations
#> on `HiCExperiment` objects.
#>
#> Read "Orchestrating Hi-C analysis with Bioconductor" online book to learn more:
#> https://js2264.github.io/OHCA/
library(HiContacts)
#> Loading required package: HiCExperiment
#> Registered S3 methods overwritten by 'readr':
#> method from
#> as.data.frame.spec_tbl_df vroom
#> as_tibble.spec_tbl_df vroom
#> format.col_spec vroom
#> print.col_spec vroom
#> print.collector vroom
#> print.date_names vroom
#> print.locale vroom
#> str.col_spec vroom
library(HiContactsData)
#> Loading required package: ExperimentHub
#> Loading required package: BiocGenerics
#> Loading required package: generics
#>
#> Attaching package: 'generics'
#> The following objects are masked from 'package:base':
#>
#> as.difftime, as.factor, as.ordered, intersect, is.element, setdiff,
#> setequal, union
#>
#> Attaching package: 'BiocGenerics'
#> The following object is masked from 'package:HiCExperiment':
#>
#> as.data.frame
#> The following objects are masked from 'package:stats':
#>
#> IQR, mad, sd, var, xtabs
#> The following objects are masked from 'package:base':
#>
#> anyDuplicated, aperm, append, as.data.frame, basename, cbind,
#> colnames, dirname, do.call, duplicated, eval, evalq, Filter, Find,
#> get, grep, grepl, is.unsorted, lapply, Map, mapply, match, mget,
#> order, paste, pmax, pmax.int, pmin, pmin.int, Position, rank,
#> rbind, Reduce, rownames, sapply, saveRDS, table, tapply, unique,
#> unsplit, which.max, which.min
#> Loading required package: AnnotationHub
#> Loading required package: BiocFileCache
#> Loading required package: dbplyr
library(plotly)
#> Loading required package: ggplot2
#>
#> Attaching package: 'ggplot2'
#> The following object is masked from 'package:HiCExperiment':
#>
#> resolution
#>
#> Attaching package: 'plotly'
#> The following object is masked from 'package:ggplot2':
#>
#> last_plot
#> The following object is masked from 'package:HiCExperiment':
#>
#> export
#> The following object is masked from 'package:stats':
#>
#> filter
#> The following object is masked from 'package:graphics':
#>
#> layout
# Load example data from HiCExperiment
cool_file <- CoolFile(HiContactsData::HiContactsData('yeast_wt', format = 'cool'))
#> see ?HiContactsData and browseVignettes('HiContactsData') for documentation
#> loading from cache
hic <- import(cool_file, focus = "II:10000-100000")
# visualize the contact matrix
plotMatrix(hic)
# Load example contact matrix (from HiSpaR's built-in data)
data(su1_contact_mat)
cat("Loaded contact matrix dimensions:", dim(su1_contact_mat), "\n")
#> Loaded contact matrix dimensions: 649 649
HiSpaR accepts two types of input:
HiCExperiment::import() (recommended for Bioconductor integration)Both formats are automatically handled by hispa_analyze(), which extracts and processes the contact matrix internally.
HiSpaR provides the main function hispa_analyze() to perform the complete 3D structure inference pipeline:
# Run analysis on the example HiCExperiment object
result_yeast <- hispa_analyze(
hic_experiment = hic,
output_dir = tempdir(),
mcmc_iterations = 1000,
use_cluster_init = TRUE,
mcmc_burn_in = 10,
filter_quantile = 0.1, # Filter out loci in the bottom 10% of contact counts
verbose = TRUE
)
#> Filtering loci by contact count:
#> Original number of loci: 91
#> Threshold ( 0.1 quantile): 311
#> Loci removed: 10
#> Final number of loci: 81
#> Consider using the `HiContacts` package to perform advanced genomic operations
#> on `HiCExperiment` objects.
#>
#> Read "Orchestrating Hi-C analysis with Bioconductor" online book to learn more:
#> https://js2264.github.io/OHCA/
#> Warning: stack imbalance in '<-', 51 then 53
#> HiSpa - Hi-C Spatial Analysis
#> ==============================
#> Contact matrix dimensions: 81 x 81
#> Output directory: /private/var/folders/5n/zqmvmhlj18xc8zrc4m8cljv80000gn/T/RtmpGI662s
#>
#> Contact matrix set successfully: 81 x 81
#> Skip zero-contact loci: enabled
#> Sample from prior positions: disabled
#>
#> === Pre-processing ===
#> Assigning clusters...
#> Building cluster relationships...
#>
#> === Structure Initialization ===
#> Initializing structure from individual cluster MCMC...
#> Cluster MCMC iterations: 1000
#> Cluster initial SD: 0.1
#> --- Finished Initializing All Cluster and Backbone Structures ---
#> Assembling global structure...
#>
#> --- Assembling Global Structure ---
#> --- Global Assembly Complete ---
#> Initial log-likelihood: 137785
#> Position matrix has shape: 81 x 3
#>
#> === Main MCMC Algorithm ===
#> Running MCMC with 1000 iterations...
#> Burn-in: 10
#> Initial SD: 0.1
#> SD floor: 0.0001
#> SD ceiling: 0.3
#>
#> --- Starting MCMC Sampling ---
#> Iter 1000/1000. LL: 166012. Best LL: 166012
#> --- MCMC Finished ---
#> Final max log-likelihood found: 166012
#> Beta0 acceptance rate: 22.1%
#> Beta1 acceptance rate: 25.4%
#> Center pos acceptance rate: 22.2667%
#> Locus pos acceptance rate: 23.1951%
#>
#> === Analysis Complete ===
#> All results saved:
#> - final_positions.txt
#> - initial_positions.txt
#> - log_likelihood_trace.txt
#> - block_timings.txt
#> - mcmc_log.txt
#> Warning: stack imbalance in '{', 47 then 49
#> Warning: stack imbalance in '{', 26 then 28
# Check the structure of results
cat("Result structure:\n")
#> Result structure:
str(result_yeast)
#> List of 3
#> $ positions: num [1:81, 1:3] -2.98 -2.76 -2.3 -2.74 -2.33 ...
#> ..- attr(*, "filtered_locus_indices")= int [1:81] 2 3 4 5 6 7 8 9 10 11 ...
#> $ beta0 : num 3.51
#> $ beta1 : num -1.41
cat("Final positions (first 5 loci):\n")
#> Final positions (first 5 loci):
print(head(result_yeast$positions, 5))
#> [,1] [,2] [,3]
#> [1,] -2.979224 -10.268965 -5.961451
#> [2,] -2.756940 -10.121343 -6.725541
#> [3,] -2.304449 -9.296515 -6.729604
#> [4,] -2.739545 -9.698656 -6.181556
#> [5,] -2.331803 -8.904005 -5.325790
cat("Estimated parameters: beta0 =", result_yeast$beta0, ", beta1 =", result_yeast$beta1, "\n")
#> Estimated parameters: beta0 = 3.513943 , beta1 = -1.410595
# Run analysis on the example contact matrix
result_su <- hispa_analyze(
hic_experiment = su1_contact_mat,
output_dir = tempdir(),
mcmc_iterations = 2000,
use_cluster_init = TRUE,
mcmc_burn_in = 10,
filter_quantile = 0.1, # Filter out loci in the bottom 10% of contact counts
verbose = TRUE
)
#> Filtering loci by contact count:
#> Original number of loci: 649
#> Threshold ( 0.1 quantile): 13595.8
#> Loci removed: 65
#> Final number of loci: 584
#> Consider using the `HiContacts` package to perform advanced genomic operations
#> on `HiCExperiment` objects.
#>
#> Read "Orchestrating Hi-C analysis with Bioconductor" online book to learn more:
#> https://js2264.github.io/OHCA/
#> Warning: stack imbalance in '<-', 51 then 53
#> HiSpa - Hi-C Spatial Analysis
#> ==============================
#> Contact matrix dimensions: 584 x 584
#> Output directory: /private/var/folders/5n/zqmvmhlj18xc8zrc4m8cljv80000gn/T/RtmpGI662s
#>
#> Contact matrix set successfully: 584 x 584
#> Skip zero-contact loci: enabled
#> Sample from prior positions: disabled
#>
#> === Pre-processing ===
#> Assigning clusters...
#> Building cluster relationships...
#>
#> === Structure Initialization ===
#> Initializing structure from individual cluster MCMC...
#> Cluster MCMC iterations: 1000
#> Cluster initial SD: 0.1
#> --- Finished Initializing All Cluster and Backbone Structures ---
#> Assembling global structure...
#>
#> --- Assembling Global Structure ---
#> --- Global Assembly Complete ---
#> Initial log-likelihood: 4.4342e+07
#> Position matrix has shape: 584 x 3
#>
#> === Main MCMC Algorithm ===
#> Running MCMC with 2000 iterations...
#> Burn-in: 10
#> Initial SD: 0.1
#> SD floor: 0.0001
#> SD ceiling: 0.3
#>
#> --- Starting MCMC Sampling ---
#> Iter 1000/2000. LL: 4.70813e+07. Best LL: 4.70813e+07
#> Iter 2000/2000. LL: 4.71426e+07. Best LL: 4.71426e+07
#> --- MCMC Finished ---
#> Final max log-likelihood found: 4.71426e+07
#> Beta0 acceptance rate: 23.45%
#> Beta1 acceptance rate: 22.65%
#> Center pos acceptance rate: 22.9583%
#> Locus pos acceptance rate: 24.7136%
#>
#> === Analysis Complete ===
#> All results saved:
#> - final_positions.txt
#> - initial_positions.txt
#> - log_likelihood_trace.txt
#> - block_timings.txt
#> - mcmc_log.txt
#> Warning: stack imbalance in '{', 47 then 49
#> Warning: stack imbalance in '{', 26 then 28
# Check the structure of results
cat("Result structure:\n")
#> Result structure:
str(result_su)
#> List of 3
#> $ positions: num [1:584, 1:3] 2.24 2.13 2.12 2.09 1.99 ...
#> ..- attr(*, "filtered_locus_indices")= int [1:584] 7 10 11 12 13 14 15 16 17 19 ...
#> $ beta0 : num 3.4
#> $ beta1 : num -1.72
cat("Final positions (first 5 loci):\n")
#> Final positions (first 5 loci):
print(head(result_su$positions, 5))
#> [,1] [,2] [,3]
#> [1,] 2.236037 -3.263358 -1.401102
#> [2,] 2.128010 -3.073882 -1.274396
#> [3,] 2.122477 -2.963711 -1.328182
#> [4,] 2.089342 -2.866634 -1.324100
#> [5,] 1.992485 -2.841021 -1.297008
cat("Estimated parameters: beta0 =", result_su$beta0, ", beta1 =", result_su$beta1, "\n")
#> Estimated parameters: beta0 = 3.396233 , beta1 = -1.724735
mcmc_iterations: Total number of MCMC iterations (default: 6000)use_cluster_init: Use hierarchical clustering for initialization (recommended for large datasets)filter_quantile: Remove loci below this quantile of contact counts (default: -1, no filtering). Use 0.1 to remove bottom 10%mcmc_burn_in: Number of burn-in iterations to discard (default: 0)We recommend using the plotly package for interactive 3D visualization of the inferred chromatin structures:
coords_yeast <- result_yeast$positions
# Create interactive 3D scatter plot
plot_ly(
x = coords_yeast[,1],
y = coords_yeast[,2],
z = coords_yeast[,3],
type = 'scatter3d',
mode = 'markers+lines',
marker = list(size = 5, color = ~seq_len(nrow(coords_yeast)), showscale = TRUE),
) %>%
layout(
title = "Inferred 3D Chromatin Structure",
scene = list(
xaxis = list(title = "X"),
yaxis = list(title = "Y"),
zaxis = list(title = "Z")
)
)
# Extract coordinates from results
coords_su <- result_su$positions
# Create interactive 3D scatter plot
plot_ly(
x = coords_su[,1],
y = coords_su[,2],
z = coords_su[,3],
type = 'scatter3d',
mode = 'markers+lines',
marker = list(size = 5, color = ~seq_len(nrow(coords_su)), showscale = TRUE),
) %>%
layout(
title = "Inferred 3D Chromatin Structure",
scene = list(
xaxis = list(title = "X"),
yaxis = list(title = "Y"),
zaxis = list(title = "Z")
)
)
All results are automatically saved to the specified output directory:
final_positions.txt: Final inferred 3D coordinates (n × 3 matrix)initial_positions.txt: Initial positions before MCMCsessionInfo()
#> R version 4.5.2 (2025-10-31)
#> Platform: aarch64-apple-darwin24.4.0
#> Running under: macOS Tahoe 26.2
#>
#> Matrix products: default
#> BLAS: /opt/homebrew/Cellar/openblas/0.3.30/lib/libopenblasp-r0.3.30.dylib
#> LAPACK: /opt/homebrew/Cellar/r/4.5.2_1/lib/R/lib/libRlapack.dylib; LAPACK version 3.12.1
#>
#> locale:
#> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
#>
#> time zone: Asia/Shanghai
#> tzcode source: internal
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] plotly_4.12.0 ggplot2_4.0.1 HiContactsData_1.12.0
#> [4] ExperimentHub_3.0.0 AnnotationHub_4.0.0 BiocFileCache_3.0.0
#> [7] dbplyr_2.5.1 BiocGenerics_0.56.0 generics_0.1.4
#> [10] HiContacts_1.12.0 HiCExperiment_1.10.0 HiSpaR_0.99.0
#> [13] BiocStyle_2.38.0
#>
#> loaded via a namespace (and not attached):
#> [1] DBI_1.2.3 httr2_1.2.2
#> [3] rlang_1.1.7 magrittr_2.0.4
#> [5] otel_0.2.0 matrixStats_1.5.0
#> [7] compiler_4.5.2 RSQLite_2.4.5
#> [9] callr_3.7.6 png_0.1-8
#> [11] vctrs_0.7.1 stringr_1.6.0
#> [13] pkgconfig_2.0.3 crayon_1.5.3
#> [15] fastmap_1.2.0 XVector_0.50.0
#> [17] labeling_0.4.3 rmarkdown_2.30
#> [19] tzdb_0.5.0 ps_1.9.1
#> [21] ggbeeswarm_0.7.3 strawr_0.0.92
#> [23] tinytex_0.58 purrr_1.2.1
#> [25] bit_4.6.0 xfun_0.56
#> [27] cachem_1.1.0 jsonlite_2.0.0
#> [29] blob_1.3.0 rhdf5filters_1.22.0
#> [31] DelayedArray_0.36.0 Rhdf5lib_1.32.0
#> [33] BiocParallel_1.44.0 parallel_4.5.2
#> [35] R6_2.6.1 bslib_0.10.0
#> [37] stringi_1.8.7 RColorBrewer_1.1-3
#> [39] GenomicRanges_1.62.1 jquerylib_0.1.4
#> [41] Rcpp_1.1.1 Seqinfo_1.0.0
#> [43] bookdown_0.46 SummarizedExperiment_1.40.0
#> [45] knitr_1.51 readr_2.1.6
#> [47] IRanges_2.44.0 Matrix_1.7-4
#> [49] tidyselect_1.2.1 abind_1.4-8
#> [51] yaml_2.3.12 codetools_0.2-20
#> [53] processx_3.8.6 curl_7.0.0
#> [55] lattice_0.22-7 tibble_3.3.1
#> [57] InteractionSet_1.38.0 Biobase_2.70.0
#> [59] withr_3.0.2 KEGGREST_1.50.0
#> [61] S7_0.2.1 evaluate_1.0.5
#> [63] ggrastr_1.0.2 Biostrings_2.78.0
#> [65] pillar_1.11.1 BiocManager_1.30.27
#> [67] filelock_1.0.3 MatrixGenerics_1.22.0
#> [69] stats4_4.5.2 vroom_1.7.0
#> [71] BiocVersion_3.22.0 S4Vectors_0.48.0
#> [73] hms_1.1.4 scales_1.4.0
#> [75] glue_1.8.0 lazyeval_0.2.2
#> [77] tools_4.5.2 BiocIO_1.20.0
#> [79] data.table_1.18.2.1 RSpectra_0.16-2
#> [81] Cairo_1.7-0 rhdf5_2.54.1
#> [83] grid_4.5.2 tidyr_1.3.2
#> [85] crosstalk_1.2.2 AnnotationDbi_1.72.0
#> [87] beeswarm_0.4.0 vipor_0.4.7
#> [89] cli_3.6.5 rappdirs_0.3.4
#> [91] viridisLite_0.4.2 S4Arrays_1.10.1
#> [93] dplyr_1.1.4 gtable_0.3.6
#> [95] sass_0.4.10 digest_0.6.39
#> [97] SparseArray_1.10.8 htmlwidgets_1.6.4
#> [99] farver_2.1.2 memoise_2.0.1
#> [101] htmltools_0.5.9 lifecycle_1.0.5
#> [103] httr_1.4.7 bit64_4.6.0-1